---
title: "The MAVERICK Dataset Codebook Version 1.0"
author:
  - name: Sebastian van Baalen \orcidlink{0000-0003-3098-5587}
    affiliation:
      - Department of Peace and Conflict Research
      - Uppsala University
  - name: David Edberg Landeström
    affiliation:
      - Department of Peace and Conflict Research
      - Uppsala University
  - name: Tor Richardson-Golinski
    affiliation:
      - Department of Peace and Conflict Research
      - Uppsala University
  - name: Kristine Höglund \orcidlink{0000-0001-7167-609X}
    affiliation:
      - Department of Peace and Conflict Research
      - Uppsala University
date: "`r format(Sys.Date(), '%B %d, %Y')`"
abstract: "This codebook introduces the Modes and Agents of Election-Related Violence in Côte d'Ivoire and Kenya (MAVERICK) dataset. MAVERICK is a georeferenced event report level dataset of electoral violence from the (re)introduction of multiparty elections in the 1990s to 2022 in Côte d'Ivoire and Kenya. The codebook sets out the operational definitions that guided the construction of the dataset, outlines the coding rules and criteria, describes the variables, discusses limitations and recommendations, and provides example code for how to work with the data."
format:
  pdf:
    pdf-engine: xelatex
    include-in-header:
      text: |
        \usepackage{lipsum}
        \setcounter{secnumdepth}{3}

        %\let\oldsection\section
        %\renewcommand\section{\clearpage\oldsection}

        % Make title, section, and subsection bold
        \addtokomafont{title}{\normalfont}
        \addtokomafont{section}{\normalfont\bfseries}
        \addtokomafont{subsection}{\normalfont\bfseries}

        % Make subsubsection bold
        \addtokomafont{subsubsection}{\normalfont\bfseries}

        % Change the font in TOC to match the document's main font
        \addtokomafont{disposition}{\normalfont}

        % Change the font of 'term' in definition environment
        \let\olditem\item
        \renewcommand{\item}[1][]{\olditem[\normalfont\bfseries #1]}

        % Ensure TOC starts on a new page
        \let\oldtoc\tableofcontents
        \renewcommand{\tableofcontents}{\clearpage\oldtoc}

        \usepackage{longtable}
        \usepackage{booktabs}
        \usepackage{graphicx}
        \usepackage{adjustbox}
        \usepackage{enumitem}
        
        % Set table font size to small
        %\usepackage{etoolbox}
        %\AtBeginEnvironment{tabular}{\tiny}
        %\AtBeginEnvironment{longtable}{\tiny}
        
        % Adjust row spacing for better table readability
        %\renewcommand{\arraystretch}{1.2}
        
        % Use custom link color
        \usepackage[dvipsnames]{xcolor}
        \definecolor{lnkcolor}{cmyk}{0.77,0.36,0.48,0.10}
        
        % Enable ORCIDs
        \usepackage{orcidlink}
        
    toc: true
    toc-depth: 3
    embed-resources: true
    link-citations: true
    filecolor: "lnkcolor"
    urlcolor: "lnkcolor"
    linkcolor: "lnkcolor"
    citecolor: "lnkcolor"
prefer-html: true
#format:
#   html:
#     embed-resources: true
editor: visual
code-fold: true
title-block-banner: true
bibliography: /Users/sebastian/Library/CloudStorage/Dropbox/Bibliography.bib
#toc: true
#toc-depth: 3
---

```{=latex}
\clearpage
```

# How to cite the dataset {.unnumbered}

::: callout-important
## When using this data, please always cite:

> Sebastian van Baalen & Kristine Höglund (2026) Introducing the Modes and Agents of Election-Related Violence in Côte d'Ivoire and Kenya (MAVERICK) dataset. *Journal of Peace Research* (online first).

When appropriate, also cite this codebook:

> Sebastian van Baalen, David Edberg Landeström, Tor Richardson-Golinski & Kristine Höglund (2025) The MAVERICK Dataset Codebook Version 1.0. Uppsala: Department of Peace and Conflict Research, Uppsala University.

When using the `eventreport` package or the data aggregation tools developed in the package, please also cite:

> Sebastian van Baalen & Kristine Höglund (2025) Trials and Triangulations: Analyzing Aggregation Sensitivity in Event Data on Political Violence. Uppsala: Department of Peace and Conflict Research, Uppsala University.
:::

```{=latex}

\clearpage
```

# Introduction {#sec-introduction}

```{r}
#| echo: false
#| results: hide
#| warning: false
#| error: false

library(tidyverse)
library(scales)
library(kableExtra)
library(ggplot2)
library(wesanderson)
library(sf)
library(eventreport)

maverick <- maverick_event_report

maverick_event_representative <- aggregate_maverick_rep()

# Define custom plot theme

my_theme <- theme_bw() +
  theme(
    plot.margin = margin(1, 1, 1, 1, "cm"),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0), size = 10),
    axis.title.y.right = element_text(margin = margin(t = 0, r = 0, b = 0, l = 10), size = 10),
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0), size = 10),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "bottom",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 10),
    strip.text = element_text(size = 10),
    plot.title = element_text(size = 10, hjust = 0, margin = margin (t = 0, r = 0, b = 10, l = 0)),
    plot.subtitle = element_text(size = 10, hjust = 0, margin = margin (t = 0, r = 0, b = 40, l = 0)),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

shapefile_civ <- st_read("civ_admbnda_adm1_cntig_ocha_itos_20180706/civ_admbnda_adm1_cntig_ocha_itos_20180706.shp")

shapefile_ke <- st_read("geoBoundaries-KEN-ADM1-all/geoBoundaries-KEN-ADM1.shp")

number_of_events <- maverick_event_representative %>% 
  summarise(events = n()) %>% 
  mutate(events = comma(events))
```

The Modes and Agents of Election-Related Violence in Côte d'Ivoire and Kenya (MAVERICK) dataset contains information on `r number_of_events` incidents of electoral violence from the (re)introduction of multiparty elections in the 1990s to 2022 in Côte d'Ivoire (1995--2022) and Kenya (1992--2022). MAVERICK is a georeferenced event dataset, meaning that it contains information about violent incidents related to electoral processes that took place on a specific date and in a particular location. The aim of the data collection was to generate a disaggregated and georeferenced event dataset of electoral violence incidents that contains in-depth and nuanced information on the actors initiating, perpetrating, and intervening in such violence.

There are several existing datasets on electoral violence (or that capture variables associated with electoral violence), including the Deadly Electoral Conflict (DECO) dataset [@fjeldehoglund2022], the Electoral Contention and Violence (ECAV) dataset [@daxeckeretal2019], the Countries at Risk of Electoral Violence (CREV) dataset [@birchmuchlinski2017], the National Elections across Democracy and Autocracy (NELDA) dataset [@hydemarinov2012], and the Varieties of Democracy (V-Dem) dataset [@vdem2023]. Although we draw inspiration from all these datasets, our dataset is most similar to DECO and ECAV, the two other georeferenced event datasets.

MAVERICK differs from these two datasets in several ways. First, MAVERICK is an actor-centered dataset that seeks to capture as much information as possible about the groups and individuals involved in initiating, perpetrating, and intervening in electoral violence. Thus, the dataset differs from DECO and ECAV in that it contains more information about the groups and individuals that are directly involved in committing electoral violence. Second, MAVERICK employs a multi-partite actor structure that allows for a more complex inclusion of actors than the dyadic actor structures used by DECO and ECAV. In particular, the MAVERICK data structure records data on each individual actor's characteristics and involvement in a particular event, as well as its relationship to other actors. Third, MAVERICK provides additional detail and nuance about the actors involved in electoral violence, including about actor types and subtypes, actor alliances, actor ties to political parties, and actor violence. Finally, MAVERICK contributes more detailed estimates of the consequences of electoral violence than DECO and ECAV, and contains information on violence context, violence form, the number of people killed and injured, and whether the violence caused displacement and/or property damage.

Collecting detailed data on the agents of electoral violence is time-consuming. Version 1.0 of MAVERICK hence focuses on two high-profile cases of electoral violence rather than a global sample: Côte d'Ivoire and Kenya. Côte d'Ivoire and Kenya are two of the most electoral violence-affected countries in the world [@fjeldehoglund2022; @straustaylor2012]. In addition, both countries have seen very severe outbreaks of electoral violence, with the 2010--2011 post-electoral crisis in Côte d'Ivoire resulting in an estimated 3,000 deaths and the 2007--2008 election crisis in Kenya causing an estimated 1,500 deaths [@klausmitchell2015].

There are several benefits to focusing on Côte d'Ivoire and Kenya. First, both cases have seen the involvement of a wide range of actors in electoral violence, including government and rebel forces, police, ethnic militias, social movements, political parties, and youth gangs [see e.g. @klaus2020b; @klopp2001; @mutahiruteere2019; @straus2011; @vanbaalengbala2023]. Second, electoral violence in both Côte d'Ivoire and Kenya has taken place before, during, after, and in between electoral periods, which allows us to examine how actor involvement changes over time. Third, and of particular interest to the data collection, both cases are well-documented and have seen several retrospective investigations into electoral violence, which facilitates the collection of more detailed actor information than is currently systematized. Finally, both Côte d'Ivoire and Kenya constitute paradigmatic cases in the academic study of electoral violence, which makes it important to revisit the historical record with an actor-centered analytical lens.

The MAVERICK dataset can help address a number of important research questions about the causes, dynamics, and consequences of electoral violence. Regarding the causes of violence, MAVERICK's multi-partite actor structure allows researchers to test meso-level theories about the actors involved in violence, theoretical expectations that are often implicit in existing theories and left untested due to a lack of data. In addition, given that MAVERICK records what role each actor played in each event, and how multiple actors related to one another, the dataset can help answer questions concerning the co-production of electoral violence. Regarding the dynamics of violence, MAVERICK contains several variables that enable scholars to answer questions about changes over time and space, such as whether a particular actor's repertoire of violence varies across localities and elections, or why some actors cause many injuries but few deaths, and vice versa. Regarding the consequences of electoral violence, MAVERICK goes beyond death counts, and also provides injury estimates and assessments of displacement and property damage, which can aid a more nuanced investigation of electoral violence severity.

# Definitions {#sec-definitions}

## Electoral violence

Our conceptual starting point builds on existing research and defines electoral violence as physical violence that is "substantially linked to an electoral contest" [@fjeldehoglund2022, 166]. This definition sets electoral violence apart from other forms of political violence through its direct connection to electoral dynamics, including party primaries, voter registration procedures, campaigning, voting, counting, and contests concerning the election outcome. Electoral violence, in other words, is political violence that would (most likely) not have occurred absent an electoral contest, or would have manifested differently [@fjeldehoglund2022, 166].

All events included in MAVERICK had to fulfill three inclusion criteria. First, the event had to be delimited in time and space, meaning that at least one of the sources described the violence as taking place in a specific location and during a limited time period (even if the location and timing were missing from the report). Second, the event had to involve *physical* violence. Electoral violence comes in many forms. Some scholars adopt broad definitions of violence that include physical violence, coercion and violent posturing, as well as psychological violence and election fraud [@birchmuchlinski2017; @burchard2015; @bjarnegard2018]. Other scholars use narrower definitions and limit their empirical focus to lethal violence [@fjeldehoglund2022, 167]. MAVERICK occupies the middle ground and restricts the data to acts of lethal or non-lethal violence against people or property. We conceive of violence as the use of physical force that results in (or has a high likelihood of resulting in) harm or injury to individuals or damage to property. This definition encompasses actions such as beatings, stabbings, vandalism, arson, shootings, tear-gassing, bombings, abductions, rape, and forced circumcision. Including non-lethal forms of electoral violence is important because low-scale violence is simultaneously the most effective and least costly form of electoral manipulation [@wahman2024].

Third, the event had to be substantially linked to an electoral contest, that is, to "a formal contest to fill political offices where the public is involved in casting the vote" [@fjeldehoglund2022, 166]. Electoral contests recorded in MAVERICK include elections for both legislative and executive branches of national, regional, or local government, referendums, as well as party primaries. Inferring the link between a violent event and electoral politics is difficult and ultimately a qualitative context-dependent decision. To make these decisions transparent, coders had to report at least one of the following inferential clues for each event: (1) one of the actors involved had explicit ties to a political party or was referred to by its party affiliation; (2) one of the targets was election-related; (3) the reported purpose of the violence was to influence an electoral process or outcome; (4) the event was part of an episode of electoral violence or occurred as a reaction to an earlier electoral violence event; (5) a source explicitly identified the event as election-related; or (6) the event took place no more than six months prior to or after an election. To allow users to sort events based on the inferential clues, each event observation records which of the inferential clues the event fulfilled (`certain1:certain6`) and a cumulative count of the number of unique clues found for each event (`certain`).
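As an illustration of how the inferential clues can be used to subset the data, the following base R sketch recomputes the cumulative clue count and keeps only events supported by at least two clues. The toy data frame, the 0/1 coding of the clue columns, and the threshold of two are assumptions made for the example and are not taken from the dataset itself.

```r
# Toy data: one row per event. The 0/1 coding of certain1..certain6 is an
# assumption made for this example.
events <- data.frame(
  event_id = c(1, 2, 3),
  certain1 = c(1, 0, 1),
  certain2 = c(0, 0, 1),
  certain3 = c(1, 0, 1),
  certain4 = c(0, 0, 0),
  certain5 = c(0, 0, 1),
  certain6 = c(1, 1, 1)
)

clue_cols <- paste0("certain", 1:6)

# Cumulative number of clues per event, mirroring the logic described for `certain`
events$certain <- rowSums(events[, clue_cols])

# Keep only events supported by at least two distinct clues (illustrative threshold)
strict <- events[events$certain >= 2, ]
strict$event_id
```

Users who consider a single clue (for example, temporal proximity to an election) too weak a basis for inclusion can vary the threshold or condition on specific clue columns instead.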

Although most events clearly included (or did not include) the different inferential clues, the reporting sometimes made us uncertain as to whether the specific clue applied. We handled such grey-zone cases using two overarching principles. First, we discussed the event in our weekly coding meetings and used our pooled case expertise to assess whether particular formulations were sufficient evidence of a particular clue. For example, some events demanded that we interpret whether vague formulations such as "youth" genuinely referred to local youth, or to the organized youth militia known as the *Jeunes Patriotes* (Young Patriots) (inferential clue 1). Another common discussion was whether the event could be seen as being part of an episode of electoral violence (inferential clue 4). Again, here we often drew on case expertise to assess whether other semantic clues were similar to how electoral violence is typically described in the country. Second, whenever we remained uncertain after discussing the event in a coding meeting, we applied a conservative approach and coded the inferential clue as not present. In most cases, this approach meant that the event simply fulfilled fewer inferential clues; when the event did not fulfill any other inferential clue, it meant that we excluded the event altogether.

## Actor involvement and roles

We define *actors* as groups or individuals that are directly involved in initiating, perpetrating, intervening in, or allowing electoral violence. In contrast to other electoral violence datasets, we do not constrain the data structure to the incumbent-opposition dichotomy or to only those actors that perpetrate violence for electoral purposes during the event. Certain actors that use violence during election-related incidents do so in conjunction with intervening in the violence, as is the case when the police or UN peacekeepers intervene to put a halt to electoral clashes. Moreover, electoral violence is sometimes enabled by the tacit acceptance of passive security forces.

We therefore seek to capture all actors involved in electoral violence during each event. Thus, each event includes a battery of actor-related variables for up to six actors per event (`actor1`; `actor2`; `actor3`; ...). We rely on this data structure because we believe that it is better suited to capture actors that are not directly associated with either the incumbent or the opposition. Moreover, we believe that this approach is better able to capture variation in the ways in which a certain actor is involved in a particular electoral violence incident.

The actors recorded for a specific event in MAVERICK can be assigned five different roles: initiator, perpetrator, intervener, passive bystander, and victim. Some of these roles are mutually exclusive (an actor cannot be a passive bystander *and* a perpetrator), whereas other roles are additive (an actor can simultaneously be an initiator, perpetrator, and victim). Sources may differ in which roles they assign to which actors. For each event, we therefore process the event reports so that 1) there is *at most* one initiator per event, 2) interveners are *never* initiators, 3) passive bystanders are *never* initiators, perpetrators, or interveners, and 4) victims only include actors that were involved in the violence. Actors that were reportedly involved in the violence, but for which there was no further information on what role they played, are not assigned any role but are nevertheless listed as involved. Actors that are *not* involved in the violence but are victimized are coded as targets (see subsubsection @sec-targets).

::: callout-note
## Actor role assignment rules

As discussed above, the actor role variables are partially dependent on one another, for example, because interveners cannot simultaneously be initiators. While the event report level data may therefore contain contradictory information on actor roles when assessed together, we process the event report data so that the actor role assignments are logically consistent using the following rules:

Initiator

:   If different sources identify different initiators for a single event, the actor identified as the initiator by the largest number of sources is coded as the initiator. If multiple actors are identified as the initiator by an equal number of sources, no actor is coded as the initiator.

Perpetrator

:   The actor is coded as a perpetrator if at least one source identifies the actor as a perpetrator of violence, unless at least one source identifies the actor as an initiator or passive bystander.

Intervener

:   The actor is coded as an intervener if at least one source identifies the actor as an intervener in the violence, unless at least one source identifies the actor as an initiator.

Passive bystander

:   The actor is coded as a passive bystander if at least one source identifies the actor as a passive bystander to the violence and the actor is identified as a security force, unless at least one source identifies the actor as an initiator, perpetrator, or intervener.

Victim

:   The actor is coded as a victim if at least one source identifies the actor as a victim of violence and at least one source identifies the actor as involved in the violence.
:::
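The initiator rule above can be sketched in a few lines of base R. The example is a minimal illustration under stated assumptions: the column names `event_id` and `initiator` and the helper `resolve_initiator()` are hypothetical, and the code is not the implementation used in the `eventreport` package.

```r
# Toy data: one row per event report, recording which actor (if any) that
# source names as the initiator. Column names are hypothetical.
claims <- data.frame(
  event_id  = c(1, 1, 1, 2, 2, 3),
  initiator = c("police", "police", "militia", "party youth", "police", NA)
)

# Return the actor named by the largest number of sources,
# or NA if there is a tie or no source names an initiator
resolve_initiator <- function(x) {
  counts <- table(x)  # table() drops NA values by default
  if (length(counts) == 0) return(NA_character_)
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1) top else NA_character_
}

resolved <- sapply(split(claims$initiator, claims$event_id), resolve_initiator)
resolved
```

In this toy example, event 1 resolves to the police (two sources against one), whereas events 2 and 3 have no initiator because the sources are tied or silent.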

### Initiators

We define the *initiator* of electoral violence as a group or individual that is the first to resort to physical violence during an incident. The initiator of electoral violence is always a perpetrator, but need not be the same as the instigator of violence. A politician can, for example, pay local youth to attack opposition voters, making the politician the instigator and the local youths the initiators. Moreover, the initiator of violence is different from the initiator of the event [@daxeckeretal2019b, 15]. For example, in the case when opposition protesters stage a peaceful demonstration but are attacked by riot police, the protesters are the initiators of the event (the protest), and the riot police are the initiators of the violence (the attack). In the event that the protesters fight back against the riot police, the police would still be the initiator, whereas if the riot police's attack was preceded by the protesters throwing rocks against the police, the protesters would be the initiator. That is, the initiator of violence is distinguished primarily by *when* they perpetrate violence. This definition further implies that there can only be one initiator per event, but multiple instigators and perpetrators.

### Perpetrators

We define a *perpetrator* of electoral violence as a group or individual that is directly involved in using physical violence during an incident of electoral violence. Unlike the instigator, the perpetrator must be present at the scene of violence, with the exception of actors that use long-range (such as artillery) or remote-controlled (such as IEDs) weapons to commit violence. Examples of acts characteristic of the perpetration of electoral violence are shooting at protesters, throwing rocks at the police, bombing or burning down polling stations, raping voters, violently enforcing roadblocks to disrupt polling, and using violent crowd-control tactics (such as tear-gassing) to crack down on protesters.

### Interveners

We define an *intervener* in electoral violence as a group or individual that employs violence or force to end or regulate violence between two or more actors. Interveners differ from perpetrators in the sense that their use of violence is not aimed at affecting an electoral contest per se, but instead seeks to prevent or halt such violence. Examples of typical interveners are the police, army, and UN peacekeepers. For example, in the case that the police use violence to disperse a violent scuffle between rival party supporters, the police are interveners.

The fictional example below illustrates these distinctions. According to our definitions, Actor 1 is not considered involved in the violence, as there is no evidence that the political candidate was present during the violent incident itself or engaged in any violence. Actor 2 is both the *initiator* and a *perpetrator*: the thugs were the first to use violence when they attacked polling stations. Actor 3 is only a *perpetrator*: the local gang committed violence, but there is no evidence that it was the first actor to use violence. Actor 4 is an *intervener* and *perpetrator*: the police intervened to restore order, but also used force that resulted in several people being injured.

::: callout-note
## Fictional example of actor roles

On the day of the elections, electoral violence erupted in the town of Oakville, leaving several injured and properties damaged. Actor 1, a political candidate, encouraged his supporters to use force to intimidate opponents and disrupt the electoral process. Actor 2, a group of thugs, triggered the violence by attacking polling stations and destroying ballot boxes. Actor 3, a local gang, retaliated against the thugs, injuring three, causing chaos and fear among the residents. Despite the efforts of police (Actor 4) to restore order, the incident resulted in a significant loss of trust in the electoral process and raised concerns about the safety and security of citizens during future elections. At the same time, international observers criticized the gendarmerie (Actor 5) for allowing the violence to spiral out of control without intervening. At least seven people were injured when police fired rubber bullets into the chaos outside one polling station.
:::

### Passive bystanders

In addition to the three actor roles described above, we also code whether the security forces *failed to intervene* in the violence. We code this role as *passive bystander*. The logic underlying this coding is that security forces are expected to intervene to end violence and protect citizens. Failure to do so can, in other words, constitute a form of violence in the sense that the security forces enable or even encourage violence by refraining from intervening. This role assignment is reserved for *security force* actors that are *present* at the event location. In the fictional example above, the gendarmerie (Actor 5) is a passive bystander.

### Victims

Finally, we define a *victim* of violence as a group or individual that is directly involved in initiating, perpetrating, intervening in or allowing electoral violence *and* who is also harmed, injured, or killed as a result of being the target of violence. In the example above, Actor 2 (the group of thugs) is a victim of violence committed by Actor 3 (the local gang). Note that the unnamed people injured are not included as actors; they are only included in the total count of injured people.

# Coding rules and criteria {#sec-coding-rules-and-criteria}

## Unit of analysis

The unit of analysis in the MAVERICK base dataset is the *event report*. An event report is an observation of an electoral violence incident (as defined above) reported in a single unique source [@cookweidmann2019; @weidmanngeelmuydenrod2015]. The examples below illustrate the difference between event reports, events, and sources across five different scenarios. Event reports differ from events in the sense that there can be multiple event reports about a single event (Scenario 1). Likewise, event reports are not synonymous with sources, as a single source (such as a long news report or human rights report) can make reference to multiple different events (Scenario 2). In order to qualify as a new event report, the report must be independent of previously coded reports about the same event. News agencies such as AFP and Reuters, for example, often publish multiple reports about the same news story, so we only record the most detailed published version of the day (Scenario 3).[^1] Likewise, some news outlets (notably those captured through AllAfrica in Factiva) simply republish news from news agencies like AFP, Reuters, and AP. These republished news items are *not* considered event reports and are therefore not included in the dataset. In addition, a new event report must focus on *news events* rather than so-called *reference events*, that is, events that are not the focus of the report but only used to contextualize the situation (Scenario 5).

[^1]: We make an exception for multiple reports from the same news agency that constitute different types of articles, such as when one article reports about a specific event and another article reports about violence in the country in general.

::: callout-note
## Exemplifying event reports, events, and sources

Scenario 1

:   A news report by AFP and a news report by Reuters both report that three people were killed in Nairobi as they were lining up to vote.

    = 2 event reports (1 event and 2 sources)

Scenario 2

:   A report by HRW contains information on electoral violence incidents on election day that took place in ten different cities.

    = 10 event reports (10 events and 1 source)

Scenario 3

:   Five versions of an AFP report updated throughout the day report that a militia group attacked a municipal councilor while he was campaigning.

    = 1 event report (1 event and 1 source)

Scenario 4

:   An election monitoring report lists seven separate attacks on polling stations, of which one attack is also covered in a BBC news report.

    = 8 event reports (7 events and 2 sources)

Scenario 5

:   An AP news story reports that election day passed without any violent incidents, a surprising outcome given that there were five violent incidents in the two weeks before the election.

    = 0 event reports (0 events and 1 source)
:::
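The distinction between event reports, events, and sources maps directly onto row counts and distinct-value counts in the data. The base R sketch below reproduces Scenario 4 with a toy data frame; the column names `event_id` and `source` are hypothetical stand-ins for the corresponding MAVERICK variables.

```r
# Toy records reproducing Scenario 4: a monitoring report lists seven attacks,
# one of which (here event 3) is also covered in a BBC news report.
# Column names are hypothetical.
reports <- data.frame(
  event_id = c(1:7, 3),
  source   = c(rep("monitoring report", 7), "BBC")
)

n_reports <- nrow(reports)                     # one row per event report
n_events  <- length(unique(reports$event_id))  # distinct events
n_sources <- length(unique(reports$source))    # distinct sources

c(event_reports = n_reports, events = n_events, sources = n_sources)
```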

Coding event reports rather than events has several benefits [see e.g. @cookweidmann2019; @weidmanngeelmuydenrod2015]. First, this coding procedure makes the information extraction step more transparent and helps preserve the raw data contained in the source material. Second, as the aggregation of multiple event reports into single events implies making decisions about report credibility and contradictory information, this coding procedure makes the aggregation process much more transparent, flexible, and reproducible. Third, by automating the aggregation process, the coding procedure allows users to replicate their analyses using different aggregation models and to override the MAVERICK aggregation rules and develop their own procedures. Fourth, by preserving the raw event reports, our data structure allows us to also use the data to investigate reporting biases and different approaches to improving data quality.

Event reports that relate to the same event are linked to one another through a unique [event ID](#subsubsec-event-id). Event IDs were assigned manually by the coders when recording the event reports. Manual assignment, rather than automatic assignment based on event timing and location, was necessary because there were often multiple separate incidents of electoral violence in the same location on the same day, such as in different parts of a city. Event IDs were regularly double-checked for errors to ensure that all event reports that plausibly describe the same event have the same event ID, and vice versa.

## Matching event reports to events

Matching multiple event reports to a single event record is a central but inherently subjective step in compiling event data. In our coding process, we assigned a common event ID to reports that described what we judged to be the same underlying incident. This judgment was based on three main criteria: (1) *temporal proximity*—typically the same or adjacent days; (2) *spatial proximity*—generally the same city, town, or village; and (3) *narrative coherence*—reports that described similar sequences of events or actors, even if they differed in specific details. In practice, certain dimensions (such as the number of people killed or the identity of perpetrators) varied more widely across sources, while others (such as date and location) tended to be more consistent and were thus weighted more heavily in the matching process. We did not require perfect consistency, but we did require that reports agree on a core set of identifying features. When in doubt, we opted to treat reports as distinct events unless there was a clear basis for clustering. While this approach inevitably involves human judgment, we aim to reduce its arbitrariness by documenting the structure of each report and retaining the event report level data in the published dataset, enabling future users to inspect, revise, or re-cluster the matching if desired.
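Users who wish to inspect or re-cluster the matching could start by screening for pairs of reports that satisfy the first two criteria. The base R sketch below flags pairs of reports from the same location that are dated at most one day apart. It is only a screening aid under stated assumptions: the toy data, the column names, and the one-day window are invented for the example, and the third criterion (narrative coherence) still requires reading the reports. It does not reproduce the manual procedure described above.

```r
# Toy event reports with invented dates and placeholder locations
reports <- data.frame(
  report_id = 1:4,
  date      = as.Date(c("2007-12-30", "2007-12-31", "2007-12-30", "2008-01-05")),
  location  = c("Town A", "Town A", "Town B", "Town A")
)

# All unordered pairs of reports
pairs <- t(combn(reports$report_id, 2))
a <- reports[match(pairs[, 1], reports$report_id), ]
b <- reports[match(pairs[, 2], reports$report_id), ]

same_place <- a$location == b$location                   # spatial proximity
close_time <- abs(as.numeric(b$date - a$date)) <= 1      # same or adjacent days

# Pairs that merit a closer look at narrative coherence
candidates <- data.frame(report_a = pairs[, 1], report_b = pairs[, 2])
candidates <- candidates[same_place & close_time, ]
candidates
```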

## Sampling procedure

The dataset builds on public reports written in English and French (for Côte d'Ivoire). To create a comprehensive and representative dataset of electoral violence in Côte d'Ivoire and Kenya, we identified these reports through a two-step sampling procedure.

The first step, which formed the core of the coding, consisted of identifying potential events through Factiva, a multi-source inventory that contains news reports from a wide range of international and domestic sources. Factiva includes both major international news outlets such as the British Broadcasting Corporation (BBC), Reuters, Agence France-Presse (AFP), and Associated Press, and domestic news reports re-published through platforms such as AllAfrica. We considered Factiva an appropriate database for the sampling because it constitutes the backbone of many existing political violence event datasets, including those of the Uppsala Conflict Data Program [@daviesetal2023].

Our search string, adapted from the baseline ECAV search string [@daxeckeretal2019], focused on common synonyms and key words for violence in English and French (for Côte d'Ivoire). We piloted the search string to balance capturing as many relevant event reports as possible against keeping the results to a manageable number. Beyond setting the relevant date range and region, we also restricted the search in three other ways. First, we limited the search to the headline and lead paragraph of the reports. Second, we excluded republished news, recurring pricing and market data, obituaries, sports, and calendars. Finally, we restricted the search to a number of predefined sources that included relevant international news agencies (e.g. AFP, Reuters, Xinhua), international news media (e.g. The Guardian, The New Humanitarian, BBC, CNN), and domestic news media identified through the AllAfrica republishing repository.

::: callout-note
## Factiva search strings

Factiva search string (Kenya)

:   election\* AND (riot OR violen\* OR attack OR kill\* OR intimidation\* OR injur\* OR clash OR death OR dead OR assass\* OR protest\*)

Factiva search string (Côte d'Ivoire)

:   election\* AND (riot OR violen\* OR attack OR kill\* OR intimidation\* OR injur\* OR clash OR death OR dead OR assass\* OR protest\* OR emeute OR attaq\* OR meurtre OR mort\* OR tué\* OR tue\* OR blesse\* OR decede\* OR deces OR heurts OR manif\*)
:::

![Full Factiva search specification](factiva.png){fig-align="center"}

The second step focused on coding a number of pre-selected secondary sources, primarily reports produced by reliable international and domestic human rights watchdogs, such as Human Rights Watch and Amnesty International. In addition, we coded election monitoring reports compiled by international election observation missions such as The Carter Center and the EU's monitoring missions. Including non-news sources in event data coding is important to combat source biases produced by the news production cycle and interests [@dietricheck2020].

```{r}
#| echo: false
#| warning: false
#| error: false

data <- maverick %>%
  mutate(
    sampling = case_when(
      sampling == 0 ~ "Factiva",
      sampling == 1 ~ "Pre-selected secondary source",
      sampling == 2 ~ "Coder-identified secondary source",
      TRUE ~ as.character(sampling)
    )
  ) %>%
  count(sampling) %>%
  mutate(
    percentage = scales::percent((n / sum(n)), accuracy = 0.01)
  ) %>% 
  rename(
    `Sampling strategy` = sampling,
    `Number of event reports` = n,
    `Share of event reports` = percentage
  )

kable_output <- data %>%
  kable(
    format = "html", table.attr = "class='table'", 
    caption = "Number and share of event reports by sampling strategy", 
    label = "tab:sampling",
    align = c("l", "r", "r")
  )

kable_output
```

## The coding process

All coding was carried out by research assistants who worked full-time during their time in the project. All coders were trained and continuously supervised by the project leaders. The coders manually read through the Factiva results and other secondary sources to identify electoral violence events. Once coders identified a report about an electoral violence event, the event was registered as an event report through the [MAVERICK online coding platform](#sec-the-coding-platform). We instructed the coders to input information into the coding platform using as little descriptive inference as possible, that is, without interpreting the information in the reports or deviating from what was reported in the source. In addition, to ensure that event reports were recorded independently of one another, we instructed the coders to refrain from using other event reports about the same event to code a specific event report or impute missing values.[^2] The logic behind this decision was that all discrepancies across sources concerning the same event should be recorded in the event report dataset and resolved systematically in the aggregation phase. The progress of the coding, difficult cases, and interpretations of the codebook were discussed in weekly coding meetings led by the project leaders.

[^2]: The only exception concerned instances where one event report recorded casualties at the city level, and several other event reports recorded casualties at multiple sub-city locations. In such situations, event reports at more precise locations were sometimes aggregated into a single event report at the city level to ensure that casualties could not be double-reported.

While compiling event reports, we took steps to reduce redundancy and minimize dependence between sources. We removed exact or near-duplicate reports (e.g., syndicated articles or republications) and excluded reports that explicitly cited another source as their sole basis for describing the event. However, full independence across sources cannot be guaranteed, as secondary sourcing is not always disclosed. Users should therefore interpret agreement across sources—particularly in the most-representative aggregation model—as reflecting convergence in available reporting, not necessarily independent corroboration.

## The MAVERICK online coding platform {#sec-the-coding-platform}

All event reports were registered in the MAVERICK Data Hub, an online coding platform that allowed the coders to use a Shiny form to input, edit, and delete data. @fig-data-hub shows the coding interface. The coding platform made the coding easier and faster, and ensured that all dataset rows followed a consistent format without random errors or encoding errors. The coding platform also forced the coders to abide by the codebook, as all fields were accompanied by key operationalizations and explanatory notes from the codebook and most fields only accepted pre-defined values. While we still made some changes to the codebook in the early stage of the data collection, the coding platform helped ensure that all changes were approved by the project leaders, and enabled the project leaders to ensure that all coded event reports were updated in line with any revised coding rules. Another central function of the coding platform was the flag function, which allowed the coders to flag event reports that they were uncertain how to code and therefore wanted to discuss in the weekly coding meetings.

![The MAVERICK online coding platform interface](coding_platform.png){#fig-data-hub}

## Aggregating event reports into events

MAVERICK comes in four editions: an event report level dataset and three event level datasets aggregated using different aggregation models. In addition, we provide the `eventreport` `R` package, which allows users to make their own custom aggregation rules and aggregate the data in ways that best fit their purposes.

### The most-representative aggregation set

Our first aggregation model (`maverick_event_representative`) is the simplest and aggregates event reports to the event level by primarily using the most frequent value, here referred to as the most-representative aggregation model. This model most closely reflects the standard idea of triangulation, which stipulates that information reported by many independent sources is more trustworthy than information reported by a few sources [@leuffenetal2013, 43]. In practice, this aggregation model uses functions to identify the mode value of each variable across the event reports, including empty strings. Treating empty strings as information is essential for capturing uncertainty, for example, because the representative answer to who perpetrated a certain attack may be that we do not know. A key strength of this aggregation model is its simplicity and ability to best reflect the collected data. The key limitations are that it does not take into account the differential quality of the underlying sources, more complex ways of dealing with discrepancies, or a preference for retaining as much information as possible.
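
The role of empty strings in the mode calculation can be illustrated with a minimal sketch in base R. The helper `mode_values()` is hypothetical and is not a function from the `eventreport` package; it only shows the general idea of counting empty strings as valid values.

```r
# Hypothetical helper: return all modal values of a character vector.
# Empty strings ("") count as valid values; NA values are dropped.
mode_values <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(character(0))
  counts <- table(x)
  names(counts)[as.vector(counts == max(counts))]
}

# Toy example: three reports of one event, two of which do not
# identify the actor type.
actor_type <- c("", "Security forces", "")
mode_values(actor_type)
```

In this toy example the modal value is the empty string, so the aggregated event would record the actor type as unknown.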

### The most-informative aggregation set

Our second model (`maverick_event_informative`) aggregates event reports to the event level using a procedure that aims to maximize the amount of information from the event reports, here referred to as the most-informative aggregation model. The logic that underpins this model is to provide the user with as detailed and nuanced information as possible [@sundbergharbom2011, 101], a prerequisite for testing micro- and meso-level theories on political violence [@balcellsjustino2014, 1348]. Moreover, this model may be particularly suitable for testing theories that are not necessarily concerned with the "ground truth" (what really happened), and instead concerned with the consequences of how an event is perceived. Examples include studies that focus on how perceptions of electoral violence, and not what actually happened, shape voter turnout and vote choice [@vonborzyskowskietal2022; @rosenzweig2023]. In some instances, sources with greater detail are also more likely to be more accurate, as specificity may indicate that the author was more proximate to and had access to more information about the event [@dulic2011, 41]. At the same time, the quest for greater detail also risks undermining the reliability of the data, as more specific information also increases the range of possible measurement error [@sundbergharbom2011, 101].

To make the aggregated data as informative as possible, the aggregation model makes use of specificity score variables to prioritize more specific event reports over less specific event reports. Specificity can mean different things for different variables. For categorical variables, such as the actors involved in the violence, we consider an event report to be more specific when it identifies actors at a more disaggregated level (e.g. at the sub-type level rather than type level). Likewise, we consider an event report to be more specific about the involved actor when it identifies the actor by name, than when it only identifies the type of actor. For dichotomous and numerical variables, such as the number of people killed in an event or whether the event resulted in property damage or not, we consider an event report more specific when it reports higher or positive values than when it reports lower or negative (zero or false) values. The rationale behind this decision is the assumption that journalists and other violence observers are more inclined to report something they know than something they do not know, suggesting that higher and positive values more often reflect the true outcome than lower and negative values. Finally, for range variables, the most-informative model seeks to generate the smallest possible range (e.g. the highest reported event start date and the lowest reported event end date).

### The most-conservative aggregation set

Our final model (`maverick_event_conservative`) aggregates event reports to the event level using a procedure that aims to prioritize information with the lowest common denominator across the event reports to avoid random and unsystematic measurement error [@sundbergharbom2011, 99; @weidmann2013, 575]. That is, the most-conservative model seeks to retain information consistent with as many of the event reports as possible. Taking into account information from event reports that are outliers may improve data accuracy, but also risks introducing biased, incorrect, or inflated information into the data. This is different from the mode value and in practice means that the most-conservative aggregation model will use the specificity score variables to prioritize 1) categorical values at a higher level of aggregation (e.g. towns over neighborhoods for the location variable), 2) minimum values for numerical and dichotomous variables (e.g. the lowest reported number of deaths), and 3) the largest possible range for range values (e.g. the lowest reported event start date and the highest reported event end date).
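
The contrast between the most-informative and most-conservative models for numerical and range variables can be sketched as follows. This is a minimal illustration in base R with invented values; the column name `deaths` is a hypothetical placeholder, and the sketch does not use the `eventreport` package.

```r
# Toy reports of a single event; `deaths` is a hypothetical column name.
reports <- data.frame(
  deaths     = c(2, 5, 3),
  date_start = as.Date(c("2010-12-01", "2010-12-02", "2010-12-01")),
  date_end   = as.Date(c("2010-12-04", "2010-12-03", "2010-12-04"))
)

# Most-informative: highest reported count, narrowest date range
# (latest start date, earliest end date).
informative <- list(
  deaths     = max(reports$deaths),
  date_start = max(reports$date_start),
  date_end   = min(reports$date_end)
)

# Most-conservative: lowest reported count, widest date range
# (earliest start date, latest end date).
conservative <- list(
  deaths     = min(reports$deaths),
  date_start = min(reports$date_start),
  date_end   = max(reports$date_end)
)
```

With these toy values, the most-informative coding reports five deaths between 2 and 3 December, whereas the most-conservative coding reports two deaths between 1 and 4 December.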

### Tie-breaking rules

All three aggregation models use two tie-breaking rules to adjudicate between multiple mode values. Tie-breaking rules are necessary because events can be multi-modal and result in a tie. This problem is pertinent in the MAVERICK data, as it includes many categorical variables and most multi-source events are based on two sources. These tie-breaking rules differ for categorical, dichotomous, and numerical variables. Events for which no single value could be determined after the tie-breaking rules are automatically coded as `"Indeterminate"`. The individual variable descriptions below outline the specific tie-breaking rules used for each variable.
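
For categorical variables, the two-step logic can be sketched as below. This is a minimal illustration in base R, assuming that higher values of `source_classification` and `certain` are preferred, as stated in the variable descriptions; the helper `break_tie()` is hypothetical and is not the implementation used to build the published datasets.

```r
# Hypothetical helper: resolve a tie between modal values of one variable.
# values, source_classification, and certain are parallel vectors with one
# element per event report of the same event.
break_tie <- function(values, source_classification, certain) {
  counts <- table(values)
  modes <- names(counts)[as.vector(counts == max(counts))]
  if (length(modes) == 1) return(modes)

  # Rule 1: among reports with a tied value, keep those with the highest
  # source classification.
  tied <- values %in% modes
  best <- tied & source_classification == max(source_classification[tied])
  candidates <- unique(values[best])
  if (length(candidates) == 1) return(candidates)

  # Rule 2: among the remaining reports, keep those with the most
  # inferential clues.
  best <- best & certain == max(certain[best])
  candidates <- unique(values[best])
  if (length(candidates) == 1) return(candidates)

  # No single value can be determined.
  "Indeterminate"
}

# Two reports disagree on the city; the second has the higher `certain` score.
break_tie(c("Nairobi", "Kisumu"), source_classification = c(1, 1), certain = c(2, 3))
```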

# Variables in MAVERICK {#sec-event-report-dataset}

Our descriptions of the variables in MAVERICK should be read with the four different dataset versions in mind. Unless indicated otherwise, the *Description* field refers to the coding rules for the event report level dataset. For variables that are also included in the three event level datasets, the *Most-representative aggregation*, *Most-informative aggregation*, and *Most-conservative aggregation* fields refer to the aggregation rules used to combine multiple event reports into a single event coding.

## Election variables

### Country (`country`)

*Description:* Name of the country in which the election and event took place.

*Values:* A character string.

```{r}
#| echo: false
#| fig.align: center
#| fig-cap: "Number of event reports by country"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = country, fill = country)) +
  scale_fill_manual(values = wes_palette("Zissou1", 2, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

### Event identifier (`event_id`) {#subsubsec-event-id}

*Description:* A unique event identifier. Constructed as \[Country code\]-\[4-digit event number\]. For example: `KE-0001` for events in Kenya, `CIV-0003` for events in Côte d'Ivoire. This event ID is used to aggregate event reports into events. Event reports that concern the same event use the same event ID.
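
The identifier format can be reproduced or validated with a short sketch in base R. The helper `make_event_id()` and the validation pattern are illustrative assumptions based on the two examples above.

```r
# Hypothetical helper: build an event ID from a country code and a number,
# zero-padding the number to four digits.
make_event_id <- function(country_code, number) {
  sprintf("%s-%04d", country_code, as.integer(number))
}

make_event_id("KE", 1)
make_event_id("CIV", 3)

# Check that a vector of IDs follows the [Country code]-[4-digit number] format.
ids <- c("KE-0001", "CIV-0003", "KE-12")
grepl("^(KE|CIV)-[0-9]{4}$", ids)
```

The last call returns `FALSE` for `"KE-12"`, which lacks the zero-padding.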

*Values:* A character string.

### Election (`election`)

*Description:* Name of the election with which the event was most closely associated. Events that were plausibly related to multiple concurrent elections (a common occurrence in Kenya) or multiple rounds of the same election (common in Côte d'Ivoire) were assigned to a generic election-year string.

*Values:* A character string.

*Most-representative aggregation:* The mode reported election, including empty strings (`""`). For multi-modal events, the election is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The mode reported election, excluding empty strings (`""`). For multi-modal events, the election is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The mode reported election, including empty strings (`""`). For multi-modal events, the election is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

```{r}
#| echo: false
#| fig.align: center
#| fig.height: 5
#| fig.cap: "Number of event reports by election"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = election, fill = election)) +
  scale_fill_manual(values = wes_palette("Zissou1", 26, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme +
  theme(axis.text.y = element_text(size = 7))
```

### Number of inferential clues for election link (`certain`) {#subsubsec-certain}

*Description:* A count variable that indicates how many inferential clues the event report provides that the event was election-related. Generated automatically as the sum of `certain1`, `certain2`, `certain3`, `certain4`, `certain5`, and `certain6`.
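
The sum can be recomputed from the six clue indicators, for example to verify the variable after subsetting or editing the data. A minimal sketch in base R with invented values:

```r
# Toy event reports with the six dichotomous clue indicators.
reports <- data.frame(
  certain1 = c(1, 0), certain2 = c(1, 0), certain3 = c(0, 0),
  certain4 = c(1, 0), certain5 = c(0, 0), certain6 = c(1, 1)
)

# Sum the indicators row by row to obtain the number of inferential clues.
clue_cols <- paste0("certain", 1:6)
reports$certain <- unname(rowSums(reports[, clue_cols]))
reports$certain
```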

*Values:* Numerical.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by number of inferential clues"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = as.factor(certain), fill = as.factor(certain))) +
  scale_fill_manual(values = wes_palette("Zissou1", 6, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

### Inferential clue: Reporting (`certain1`) {#subsubsec-certain1}

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because the event report or another event report explicitly identified the event as election-related.

*Values:* 0 = No, 1 = Yes.

### Inferential clue: Actor (`certain2`)

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because at least one of the actors involved had explicit ties to a political party or was referred to by their party affiliation.

*Values:* 0 = No, 1 = Yes.

### Inferential clue: Target (`certain3`)

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because at least one of the targets was election-related, such as voters at a polling station, political candidates, election observers, security forces deployed to oversee the election, electoral material, or electoral infrastructure.

*Values:* 0 = No, 1 = Yes.

### Inferential clue: Purpose (`certain4`)

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because the reported purpose of the event was to influence an electoral process or outcome. Purpose was mainly inferred from statements issued by the perpetrator, or from the context and the reported alleged intent.

*Values:* 0 = No, 1 = Yes.

### Inferential clue: Episode (`certain5`)

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because the event was part of an episode of electoral violence or occurred as a reaction to an earlier electoral violence event.

*Values:* 0 = No, 1 = Yes.

### Inferential clue: Timing (`certain6`)

*Description:* A dichotomous variable that indicates whether the reported event was inferred to be election-related because it occurred at most 6 months prior to or after an election.

*Values:* 0 = No, 1 = Yes.

## Time-related variables

### Event starting date (`date_start`)

*Description:* The earliest possible date the event occurred expressed in YYYY-MM-DD format. If the report contained no time clues, this date is coded as the day two weeks prior to the publication date.

*Values:* Dates.

*Most-representative aggregation:* The mode reported start date. For multi-modal events, the start date is set to the mean of the modes, rounded down to the earlier date to ensure whole dates.

*Most-informative aggregation:* Latest possible reported start date.

*Most-conservative aggregation:* Earliest possible reported start date.

### Event ending date (`date_end`)

*Description:* The last possible date the event occurred expressed in YYYY-MM-DD format. If the report contained no time clues, this date is the publication date.

*Values:* Dates.

*Most-representative aggregation:* The mode reported end date. For multi-modal events, the end date is set to the mean of the modes, rounded down to the earlier date to ensure whole dates.

*Most-informative aggregation:* Earliest possible reported end date.

*Most-conservative aggregation:* Latest possible reported end date.
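
The most-representative rule for `date_start` and `date_end` can be sketched as follows. This is a minimal illustration in base R; the helper `mode_date()` is hypothetical, and rounding down with `floor()` is our reading of "rounded to the earliest date".

```r
# Hypothetical helper: mode of a vector of dates. If several dates are
# equally frequent, average them on the numeric day scale and round down.
mode_date <- function(dates) {
  counts <- table(as.character(dates))
  modes <- as.Date(names(counts)[as.vector(counts == max(counts))])
  as.Date(floor(mean(as.numeric(modes))), origin = "1970-01-01")
}

# Two reports disagree: the mean of 28 and 31 December is 29.5, rounded
# down to 29 December.
mode_date(as.Date(c("2007-12-28", "2007-12-31")))
```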

### Vague date indicator (`date_vague`)

*Description:* A dichotomous variable that indicates whether the event report lacked time clues and was coded based on publication date. Only available in the event report level dataset.

*Values:* 0 = No, 1 = Yes, coded based on publication date.
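
The default date window for reports without time clues follows directly from the rules for `date_start` and `date_end`. A minimal sketch in base R, where the helper `code_vague_dates()` and the example publication date are illustrative assumptions:

```r
# Hypothetical helper: default dates for a report without time clues.
# The start date is two weeks before publication; the end date is the
# publication date; the report is marked as vague.
code_vague_dates <- function(publication_date) {
  data.frame(
    date_start = publication_date - 14,
    date_end   = publication_date,
    date_vague = 1
  )
}

code_vague_dates(as.Date("2010-11-30"))
```

Users who need precise event timing can exclude these reports by filtering on `date_vague == 0` in the event report level dataset.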

## Geographic variables

### City event location (`city`)

*Description:* The name of the city or village in which the violence took place. All city and village names were retrieved from [https://www.geonames.org](https://www.geonames.org/). For event reports with less specific location clues, we sometimes inferred the city through additional research or case knowledge. For example, events taking place at "party headquarters" were coded based on additional research on the exact location of those headquarters, even if no city was explicitly mentioned in the report.

*Values:* A character string, including `Indeterminate`.

*Most-representative aggregation:* The mode reported city, including empty strings (`""`). For multi-modal events, the city is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most precise reported city, that is, the city reported in the report with the highest `geo_precision` value. For multi-modal events, the city is coded as the most precise reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least precise reported city, that is, the city reported in the report with the lowest `geo_precision` value. For multi-modal events, the city is coded as the least precise reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

### Precise event location (`location`)

*Description:* A text description of the most precise event location described in the report.

*Values:* A character string, including `Indeterminate`.

*Most-representative aggregation:* The mode reported location, including empty strings (`""`). For multi-modal events, the location is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most precise reported location, that is, the location reported in the report with the highest `geo_precision` value. For multi-modal events, the location is coded as the most precise reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least precise reported location, that is, the location reported in the report with the lowest `geo_precision` value. For multi-modal events, the location is coded as the least precise reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

### Location latitude (`latitude`)

*Description:* The latitude for the point indicated in `location`. Coordinates were retrieved from [https://www.geonames.org](https://www.geonames.org/), <https://www.google.com/maps>, or [https://www.openstreetmap.org](https://www.openstreetmap.org/).

*Values:* Numerical, including `NA`.

### Location longitude (`longitude`)

*Description:* The longitude for the point indicated in `location`. Coordinates were retrieved from [https://www.geonames.org](https://www.geonames.org/), <https://www.google.com/maps>, or [https://www.openstreetmap.org](https://www.openstreetmap.org/).

*Values:* Numerical, including `NA`.

### Geo-precision indicator (`geo_precision`)

*Description:* A categorical variable that denotes how precisely the geo-coordinates are coded.

*Values:* 1 = Country, 2 = Region (e.g. district, region, department), 3 = City, 4 = Sub-city admin areas (e.g. commune), 5 = Neighborhood or village, 6 = Exact location (street, building, square).

*Most-representative aggregation:* The geo-precision value corresponding to the mode reported location.

*Most-informative aggregation:* The highest geo-precision value.

*Most-conservative aggregation:* The lowest geo-precision value.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by geo-precision score"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = as.factor(geo_precision), fill = as.factor(geo_precision))) +
  scale_fill_manual(values = wes_palette("Zissou1", 6, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```
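
The way `geo_precision` drives the choice of location in the most-informative and most-conservative models can be sketched as follows. This is a minimal illustration in base R with invented location strings; it ignores ties, which the actual aggregation resolves through `source_classification` and `certain`.

```r
# Toy reports of one event at three levels of geographic precision
# (3 = City, 5 = Neighborhood or village, 6 = Exact location).
reports <- data.frame(
  location      = c("Nairobi", "Kibera", "Polling station, Kibera"),
  geo_precision = c(3, 5, 6),
  stringsAsFactors = FALSE
)

# Most-informative: location from the report with the highest precision.
informative_location <- reports$location[which.max(reports$geo_precision)]

# Most-conservative: location from the report with the lowest precision.
conservative_location <- reports$location[which.min(reports$geo_precision)]
```

Here the most-informative model would retain the polling station and the most-conservative model would retain the city.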

## Actor-related variables

The MAVERICK dataset includes information for up to six actors per event (the maximum number of actors identified in any event). Actor-related variables therefore come in sets of six, identified by the `actor1, actor2, actor3, actor4, actor5, actor6` prefixes. For simplicity, in this codebook, we refer to the actor sets using an asterisk (i.e. `actor*`). The order is not significant, and actors can be recorded in any of the six actor sets, even when earlier actor sets are empty. That is, for an event report that records only two actors, those two actors can still be recorded using the `actor3:actor6` prefixes. The reason for this data structure is that the columns are used to identify different event report references to *the same actor* for event reports pertaining to the same event. For example, if two event reports about the same event record the involvement of the gendarmerie, these actor records will both be recorded in the same column.

 

::: callout-caution
## Unknown actors vs. no recorded actors

The MAVERICK dataset sometimes identifies the *number of involved actors* even when there is little or no further information about those actors. Empty strings (`""`) in a particular actor variable set indicate that an actor was present but that little or no information was reported about the involved actor. In contrast, events can record fewer than six (but not fewer than one) actors. For the unused actor sets, all variables in the actor variable set are coded as NA (`NA_character_`).
:::
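
The distinction matters when counting or filtering actors. A minimal sketch in base R with invented values shows how the two cases can be separated:

```r
# Toy values for one actor column across three event reports.
actor2 <- c("Police", "", NA_character_)

# NA means no actor was recorded in this set; "" means an actor was
# present but not identified.
status <- ifelse(
  is.na(actor2), "no actor recorded",
  ifelse(actor2 == "", "actor present, unidentified", "actor identified")
)
status

# Count reports in which an actor was present, whether identified or not.
sum(!is.na(actor2))
```

Filtering on `actor2 != ""` alone would silently drop both unidentified actors and rows without an actor, since comparisons with `NA` return `NA`.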

### Actor name (`actor*`)

*Description:* Records the name of the actor involved in the event. Names were initially coded as reported, and then harmonized in the post-coding [data cleaning](#sec-data-cleaning). Whenever the `actor*` variable contains an empty string (`""`), it implies that `actor*` was reportedly involved in the event, but that no identifying information was reported for that actor. In contrast, whenever the `actor*` variable is coded as `NA_character_`, it implies that no `actor*` was reportedly involved in the event.

*Values:* A character string, including `Indeterminate`.

*Most-representative aggregation:* The mode reported actor name, including empty strings (`""`). For multi-modal events, the actor name is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most specific reported actor name, that is, the actor name reported in the report with the highest `actor*_precision` value. For multi-modal events, the actor name is coded as the most specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least specific reported actor name, that is, the actor name reported in the report with the lowest `actor*_precision` value. For multi-modal events, the actor name is coded as the least specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

### Actor identifier (`actor*_id`)

*Description:* A unique actor identifier corresponding to the actor name recorded in `actor*`. Constructed as \[Country code\]-\[unique three-digit ID\]. For example: `CIV-001` for actors in Côte d'Ivoire and `KE-001` for actors in Kenya.

*Values:* A character string, including `Indeterminate`.

### Actor type (`actor*_type`)

*Description:* A categorical variable that indicates the type of actor involved in the event.

*Values:*

— NA (when no `actor*` is recorded)

— Empty string (when `actor*` is recorded but actor type is unknown)

— Security forces

— Other state actor

— Non-state armed actor

— Political actor

— Citizens

— Other

*Most-representative aggregation:* The mode reported actor type, including empty strings (`""`). For multi-modal events, the actor type is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most specific reported actor type, that is, the actor type reported in the report with the highest `actor*_precision` value. For multi-modal events, the actor type is coded as the most specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least specific reported actor type, that is, the actor type reported in the report with the lowest `actor*_precision` value. For multi-modal events, the actor type is coded as the least specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by actor type"

maverick %>% 
  select(
    event_id, actor1_type, actor2_type, actor3_type,
    actor4_type, actor5_type, actor6_type
  ) %>% 
  pivot_longer(
    cols = c(-event_id),
    names_to = "actor",
    values_to = "actor_type"
  ) %>% 
  filter(!is.na(actor_type)) %>% 
  ggplot() +
  geom_bar(aes(x = actor_type, fill = actor_type)) +
  scale_fill_manual(values = wes_palette("Zissou1", 7, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of actor records"
  ) +
  guides(fill = "none") +
  my_theme
```

### Actor subtype (`actor*_subtype`)

*Description:* A categorical variable that indicates the subtype of actor involved in the event.

*Values:*

— NA (when no `actor*` is recorded)

— Empty string (when `actor*` is recorded but actor subtype is unknown)

— Security forces: Army

— Security forces: Paramilitary police

— Security forces: Police

— Security forces: Presidential guard

— Non-state armed actor: Militia

— Non-state armed actor: Rebel group

— Non-state armed actor: Criminal gang

— Non-state armed actor: Vigilante group

— Political actor: Politician(s)

— Political actor: Political party

— Political actor: Youth wing

— Political actor: Women's wing

— Political actor: Candidate(s)

— Political actor: Party supporters

— Political actor: Protesters

— Political actor: Student union

— Citizens: Voters

— Citizens: Residents

— Citizens: Outsiders

— Citizens: Youths

— Other

*Most-representative aggregation:* The mode reported actor subtype, including empty strings (`""`). For multi-modal events, the actor subtype is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most specific reported actor subtype, that is, the actor subtype reported in the report with the highest `actor*_precision` value. For multi-modal events, the actor subtype is coded as the most specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least specific reported actor subtype, that is, the actor subtype reported in the report with the lowest `actor*_precision` value. For multi-modal events, the actor subtype is coded as the least specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by actor subtype"

maverick %>% 
  select(
    event_id, actor1_subtype, actor2_subtype, actor3_subtype,
    actor4_subtype, actor5_subtype, actor6_subtype
  ) %>% 
  pivot_longer(
    cols = c(-event_id),
    names_to = "actor",
    values_to = "actor_subtype"
  ) %>% 
  filter(!is.na(actor_subtype)) %>% 
  ggplot() +
  geom_bar(aes(x = actor_subtype, fill = actor_subtype)) +
  scale_fill_manual(values = wes_palette("Zissou1", 18, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of actor records"
  ) +
  guides(fill = "none") +
  my_theme +
  theme(axis.text.y = element_text(size = 7))
```
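
The tie-breaking logic behind the most-representative rule for actor subtypes can be sketched on toy data as follows. This is a minimal illustration rather than the `eventreport` implementation: the helper `aggregate_mode()` and all values are hypothetical, and the sketch reads the rule as "among the tied modal values, keep the one reported by the source with the highest `source_classification`, then the highest `certain`."

```{r, echo=TRUE, eval=FALSE}
library(dplyr)

# Toy event reports for a single event (values are invented for illustration)
reports <- tibble(
  event_id = c(1, 1, 1, 1),
  actor1_subtype = c(
    "Security forces: Police", "Security forces: Police",
    "Security forces: Army", "Security forces: Army"
  ),
  source_classification = c(1, 2, 3, 1),
  certain = c(1, 1, 0, 1)
)

# Hypothetical helper: modal value, with ties broken by source_classification
# and then by certain (one reading of the rule described in the codebook)
aggregate_mode <- function(value, source_classification, certain) {
  counts <- table(value, useNA = "no")
  if (length(counts) == 0) return(NA_character_)
  modes <- names(counts)[counts == max(counts)]
  if (length(modes) == 1) return(modes)
  # Keep only the reports that carry one of the tied modal values
  tied <- value %in% modes
  # Sort the tied reports: highest source classification first, then certainty
  ord <- order(-source_classification[tied], -certain[tied])
  value[tied][ord][1]
}

event_level <- reports %>%
  group_by(event_id) %>%
  summarize(
    actor1_subtype = aggregate_mode(
      actor1_subtype, source_classification, certain
    ),
    .groups = "drop"
  )

print(event_level)
```

In this toy event both subtypes are reported twice, so the report with the highest `source_classification` decides the coding.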

### Actor party affiliation (`actor*_party`)

*Description:* A string variable that lists the acronym of the political party with which the actor was affiliated.

*Values:* A character string, including `Indeterminate`.

*Most-representative aggregation:* The mode reported actor party affiliation, including empty strings (`""`). For multi-modal events, the actor party affiliation is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The most specific reported actor party affiliation, that is, the actor party affiliation reported in the report with the highest `actor*_precision` value. For multi-modal events, the actor party affiliation is coded as the most specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The least specific reported actor party affiliation, that is, the actor party affiliation reported in the report with the lowest `actor*_precision` value. For multi-modal events, the actor party affiliation is coded as the least specific reported value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

### Actor role: Initiator (`actor*_initiator`)

*Description:* A dichotomous variable that indicates whether the actor initiated the violence.

*Values:* 0 = No, 1 = Yes, the actor initiated the violence.

*All aggregations:* The actor identified as the initiator by the largest number of sources. If multiple actors are identified as the initiator by an equal number of sources, no actor is coded as the initiator.
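
The initiator rule amounts to a plurality vote across sources in which ties produce no initiator. A minimal sketch in base `R`, using a hypothetical helper `pick_initiator()` and invented actor labels (neither is part of the dataset or the `eventreport` package):

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper: returns the actor named as initiator by the largest
# number of reports, or NA when no actor is named or when there is a tie
pick_initiator <- function(actor, initiator) {
  # Count how many reports identify each actor as the initiator
  votes <- tapply(initiator, actor, sum, na.rm = TRUE)
  top <- names(votes)[votes == max(votes)]
  if (length(top) == 1 && max(votes) > 0) top else NA_character_
}

# Toy data: one row per event report and actor, all for the same event
actor <- c("Police", "Police", "Youth wing", "Youth wing", "Militia")
initiator <- c(1, 0, 1, 1, 0)

clear_case <- pick_initiator(actor, initiator)

# Tie: two actors are each named as initiator by one report
tied_case <- pick_initiator(c("Police", "Youth wing"), c(1, 1))

print(clear_case)
print(tied_case)
```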

### Actor role: Perpetrator (`actor*_perpetrator`)

*Description:* A dichotomous variable that indicates whether the actor perpetrated violence during the event.

*Values:* 0 = No, 1 = Yes, the actor perpetrated violence during the event.

*All aggregations:* Coded as a perpetrator whenever at least one source identifies the actor as a perpetrator, unless at least one source identifies the actor as a passive bystander.

### Actor role: Intervener (`actor*_intervener`)

*Description:* A dichotomous variable that indicates whether the actor intervened to stop the violence during the event.

*Values:* 0 = No, 1 = Yes, the actor intervened to stop the violence during the event.

*All aggregations:* Coded as an intervener whenever at least one source identifies the actor as an intervener, unless at least one source identifies the actor as an initiator or passive bystander.
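
The perpetrator and intervener rules combine an "at least one source" condition with exclusions based on conflicting role reports. The sketch below illustrates that logic for a single actor in a single event. The helper `aggregate_roles()` is hypothetical, and it assumes that the exclusions refer to the report-level role codings rather than to the aggregated ones.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper: each argument is a vector with one 0/1 entry per
# event report about the same actor in the same event
aggregate_roles <- function(initiator, perpetrator, intervener, bystander) {
  any_initiator <- any(initiator == 1, na.rm = TRUE)
  any_perpetrator <- any(perpetrator == 1, na.rm = TRUE)
  any_intervener <- any(intervener == 1, na.rm = TRUE)
  any_bystander <- any(bystander == 1, na.rm = TRUE)

  c(
    # Perpetrator: at least one source says so, unless any source reports
    # the actor as a passive bystander
    perpetrator = as.integer(any_perpetrator && !any_bystander),
    # Intervener: at least one source says so, unless any source reports
    # the actor as an initiator or a passive bystander
    intervener = as.integer(any_intervener && !any_initiator && !any_bystander)
  )
}

# Three reports: one names the actor as perpetrator, one as intervener
roles_a <- aggregate_roles(
  initiator = c(0, 0, 0), perpetrator = c(1, 0, 0),
  intervener = c(0, 1, 0), bystander = c(0, 0, 0)
)

# Two reports that conflict: perpetrator in one, passive bystander in the other
roles_b <- aggregate_roles(
  initiator = c(0, 0), perpetrator = c(1, 0),
  intervener = c(0, 0), bystander = c(0, 1)
)

print(roles_a)
print(roles_b)
```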

### Actor role: Passive bystander (`actor*_bystander`)

*Description:* A dichotomous variable that indicates whether the actor was a passive bystander to the violence during the event. Only applicable to security forces that were present during the event.

*Values:* 0 = No, 1 = Yes, the actor was a passive bystander during the event.

*All aggregations:* Coded as a passive bystander if at least one source identifies the actor as a passive bystander and the actor is a security force, unless at least one source identifies the actor as an initiator, perpetrator, or intervener.

### Actor role: Victim (`actor*_victim`)

*Description:* A dichotomous variable that indicates whether the actor was also a victim of violence during the event. Only applicable to actors that were *also* involved in initiating, perpetrating, intervening in, or remaining passive about the violence. Actors that were victimized in the violence, but themselves not involved, are coded in the `target` variable.

*Values:* 0 = No, 1 = Yes, the actor was a victim of violence during the event.

*Most-representative aggregation:* Mode value. For multi-modal events, actor victim is coded as 1.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.
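
For dichotomous variables such as this one, the three rule sets reduce to simple operations on the reported 0/1 values. A minimal sketch, using a hypothetical helper `aggregate_binary()` that is not part of the `eventreport` package:

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper: x holds the 0/1 codings from all reports of one event
aggregate_binary <- function(
    x, rule = c("representative", "informative", "conservative")) {
  rule <- match.arg(rule)
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_integer_)
  switch(
    rule,
    # Mode of a 0/1 variable; an even split (mean of 0.5) is coded as 1
    representative = if (mean(x) >= 0.5) 1L else 0L,
    # Highest reported value
    informative = as.integer(max(x)),
    # Lowest reported value
    conservative = as.integer(min(x))
  )
}

# Two reports that disagree on whether the actor was a victim
split_reports <- c(1, 0)
result <- c(
  representative = aggregate_binary(split_reports, "representative"),
  informative = aggregate_binary(split_reports, "informative"),
  conservative = aggregate_binary(split_reports, "conservative")
)

print(result)
```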

### Actor violence repertoire (`actor*_violence`)

*Description:* A string variable that lists *all* forms of violence that the actor perpetrated during the event.

*Values:*

— Abduction (kidnapping, forced disappearance)

— Arson (burning, torching)

— Beating (confrontation, fighting, scuffle)

— Crowd control measures (rubber bullets, tear gas, water cannons)

— Looting (vandalism)

— Remote violence (bombing, suicide bombing, bomb attack, shelling, IED explosion)

— Riot (violent protest, violent uprising, mob violence, stone throwing)

— Sexual violence (rape, forced circumcision, sexual assault)

— Shooting (gun shots)

— Stabbing (use of machetes, use of spears)

— Torture (mutilation)

— Other (bow and arrow, poison)

*Most-representative aggregation:* All recorded unique values.

*Most-informative aggregation:* All recorded unique values.

*Most-conservative aggregation:* All recorded unique values.
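
Because all three rule sets keep every recorded value, aggregation here is a set union over the reports. The sketch below assumes, purely for illustration, that multiple forms of violence are stored in one string separated by semicolons. The separator and the helper `aggregate_repertoire()` are assumptions and should be checked against the data.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper: x holds the actor*_violence strings from all reports
# of one event; sep is the assumed separator between forms of violence
aggregate_repertoire <- function(x, sep = ";") {
  parts <- unlist(strsplit(x[!is.na(x)], sep, fixed = TRUE))
  parts <- trimws(parts)
  # Keep each form of violence once and drop empty entries
  parts <- sort(unique(parts[parts != ""]))
  paste(parts, collapse = paste0(sep, " "))
}

repertoire <- aggregate_repertoire(
  c("Beating; Arson", "Arson", "Shooting; Beating", NA)
)

print(repertoire)
```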

### Actor precision indicator (`actor*_precision`)

*Description:* A categorical variable that denotes how precisely the actor was coded. Only available in the event report level dataset.

*Values:*

0 = Actor unknown

1 = Actor unknown except for a few attributes

2 = Actor identified by type

3 = Actor identified by subtype

4 = Named actor

## Violence-related variables

### Event context (`event_context`)

*Description:* A categorical variable that indicates the context in which the violence occurred.

*Values:*

— Arrest or raid

— Canvassing

— Clash

— Incumbent campaign event

— Military operations (combat, patrols)

— Opposition campaign event

— Protest

— Riot

— Security force crack-down on protest

— Vote counting

— Voting

— Other context

— Unknown

*Most-representative aggregation:* The mode reported event context, including empty strings (`""`). For multi-modal events, the event context is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The mode reported event context, excluding empty strings (`""`). For multi-modal events, the event context is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The mode reported event context, including empty strings (`""`). For multi-modal events, the event context is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by event context"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = as.factor(event_context), fill = as.factor(event_context))) +
  scale_fill_manual(values = wes_palette("Zissou1", 14, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

### Primary target of violence (`target`) {#sec-targets}

*Description:* A categorical variable that lists the reported primary target of violence.

*Values:*

— Unknown

— Campaigners/canvassers

— Citizens

— Election workers or personnel

— Election-related activists or protesters

— Electoral infrastructure or material

— Journalists

— Non-state armed group members

— Other

— Other ethnic group

— Party offices

— Peacekeepers

— Political candidate or politician

— Security forces

— Voters or party supporters

*Most-representative aggregation:* The mode reported target, including empty strings (`""`). For multi-modal events, the target is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-informative aggregation:* The mode reported target, excluding empty strings (`""`). For multi-modal events, the target is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

*Most-conservative aggregation:* The mode reported target, including empty strings (`""`). For multi-modal events, the target is coded as the mode value with the highest value in `source_classification` and (if still multi-modal) the highest value in `certain`.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports by primary target"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = target, fill = target)) +
  scale_fill_manual(values = wes_palette("Zissou1", 15, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

### Best estimated number of deaths (`deaths_best`)

*Description:* A count variable that records the total number of estimated deaths during the event. Non-numerical estimates (such as "several," "many," and "dozens") were converted into numerical estimates using an appended version of the UCDP vague numbers translator.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.
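
The three rules for count variables can be illustrated with a handful of invented report-level figures. The helper `lowest_mode()` is hypothetical and only sketches the "lowest mode value" rule: when several values are reported equally often, the smallest of them is kept.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper: the most frequently reported value, and the smallest
# such value if several are equally frequent
lowest_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  counts <- table(x)
  min(as.numeric(names(counts)[counts == max(counts)]))
}

# Toy deaths_best values from five reports of the same event
deaths_reports <- c(2, 2, 5, 5, 12)

deaths_aggregated <- c(
  representative = lowest_mode(deaths_reports),      # lowest mode value
  informative = max(deaths_reports, na.rm = TRUE),   # highest value
  conservative = min(deaths_reports, na.rm = TRUE)   # lowest value
)

print(deaths_aggregated)
```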

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Density of the number of deaths per event report"

maverick %>% 
  ggplot() +
  geom_density(aes(x = deaths_best)) +
  scale_x_continuous(
    trans = pseudo_log_trans(base = 10),
    labels = label_number(),
    breaks = c(0, 10, 25, 50, 100, 250)
  ) +
  labs(
    x = "Number of deaths",
    y = "Density"
  ) +
  my_theme
```

### Lowest estimated number of deaths (`deaths_low`)

*Description:* A count variable that records the lowest total number of estimated deaths during the event. When multiple figures were available in a single event report, this figure represents the lowest reported number.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

### Highest estimated number of deaths (`deaths_high`)

*Description:* A count variable that records the highest total number of estimated deaths during the event. When multiple figures were available in a single event report, this figure represents the highest reported number.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

### Best estimated number of injuries (`injuries_best`)

*Description:* A count variable that records the total number of estimated injured people during the event. Non-numerical estimates (such as "several," "many," and "dozens") were converted into numerical estimates using an appended version of the UCDP vague numbers translator.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Density of the number of injured people per event report"

maverick %>% 
  ggplot() +
  geom_density(aes(x = injuries_best)) +
  scale_x_continuous(
    trans = pseudo_log_trans(base = 10),
    labels = label_number(),
    breaks = c(0, 10, 25, 50, 100, 250)
  ) +
  labs(
    x = "Number of injured people",
    y = "Density"
  ) +
  my_theme
```

### Lowest estimated number of injuries (`injuries_low`)

*Description:* A count variable that records the lowest total number of estimated injured people during the event. When multiple figures were available in a single event report, this figure represents the lowest reported number.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

### Highest estimated number of injuries (`injuries_high`)

*Description:* A count variable that records the highest total number of estimated injured people during the event. When multiple figures were available in a single event report, this figure represents the highest reported number.

*Values:* Numerical.

*Most-representative aggregation:* Lowest mode value.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

::: callout-note
## MAVERICK vague numbers translator

**a couple:** 2

**multiple:** 2

**many:** 3 (indicates more than one was killed)

**some:** 2 (indicates more than one was killed)

**a few:** 2 (indicates that more than one was killed, following the reasoning that the source would say “a dozen” if there were more)

**numerous:** 2 (indicates more than one was killed)

**several:** 3 best, 3 high, 2 low (indicates that more than one was killed, with uncertainty about how many were killed)

**group:** 2-3 (report how you coded and why)

**casualties:** 2 (make sure that “casualties” only refers to deaths and does not include wounded)

**heavy casualties:** 3 (make sure that “casualties” only refers to deaths and does not include wounded)

**over a dozen:** 13 (more than one dozen, i.e. 12)

**dozens:** 24 (with the reasoning that using plural indicates that it is at least 2 dozen)

**a score:** 20

**scores:** 40 (plural of score)

**over a hundred:** 101

**hundreds:** 200 (reasoning that it is plural, i.e. at least 2x100)

**over a thousand:** 1001

**thousands:** 2000 (given that it is plural and thus at least 2x1000)

***up to*** **X:** X high, X – 10 best, X – 10 low

***close to*** **X:** X-1 high, X-10 best, X-10 low

***approximately*** **X:** 10% rule =\> X+10% high, X best, X-10% low

***at least*** **X:** X low, X best, X high (MAVERICK specific, added after discussion with UCDP)

***nearly*** **X:** X-1 high, X-1 best, X-10% low

***A said*** **X,** ***and B said*** **Y (where Y is higher than X):** X best, X low, Y high

**killed or injured/captured:** 0 on best, high, and low (BHL) (could be that none was killed)

**killed and injured/captured:** 1 on BHL (at least 1 was killed)

**neutralized:** 0 on BHL, unless it is commonly used as a way of expressing number of killed in that country (depends on the context, if other sources confirm the figures etc)
:::
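
Users who want to reproduce or audit the conversion of vague quantities can encode part of the translator as a lookup table. The sketch below covers only the fixed terms that map to a single number, plus the 10% rule for "approximately X". The object and function names are hypothetical, and terms with separate best, high, and low values (such as "several") are deliberately left out.

```{r, echo=TRUE, eval=FALSE}
# Fixed terms from the translator that map to a single number
vague_numbers <- c(
  "a couple" = 2, "multiple" = 2, "many" = 3, "some" = 2, "a few" = 2,
  "numerous" = 2, "over a dozen" = 13, "dozens" = 24, "a score" = 20,
  "scores" = 40, "over a hundred" = 101, "hundreds" = 200,
  "over a thousand" = 1001, "thousands" = 2000
)

# Hypothetical helper: look up a term, ignoring case and surrounding spaces;
# terms that are not in the table return NA
translate_vague <- function(term) {
  unname(vague_numbers[tolower(trimws(term))])
}

# Hypothetical helper for "approximately X": X - 10% low, X best, X + 10% high
translate_approximately <- function(x) {
  c(low = x - 0.1 * x, best = x, high = x + 0.1 * x)
}

fixed_terms <- translate_vague(c("Dozens", " a few ", "unlisted term"))
approx_fifty <- translate_approximately(50)

print(fixed_terms)
print(approx_fifty)
```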

### Displacement (`displacement`)

*Description:* A dichotomous variable that indicates whether the event resulted in people being displaced from their homes.

*Values:* 0 = No displacement reported, 1 = Displacement reported.

*Most-representative aggregation:* Mode value. For multi-modal events, displacement is coded as 1.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports recording displacement"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = as.factor(displacement), fill = as.factor(displacement))) +
  scale_fill_manual(values = wes_palette("Zissou1", 2, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

### Material destruction (`damage`)

*Description:* A dichotomous variable that indicates whether the event resulted in property or infrastructure damage.

*Values:* 0 = No property or infrastructure damage reported, 1 = Property or infrastructure damage reported.

*Most-representative aggregation:* Mode value. For multi-modal events, property destruction is coded as 1.

*Most-informative aggregation:* Highest value.

*Most-conservative aggregation:* Lowest value.

```{r}
#| echo: false
#| fig.align: center
#| fig.cap: "Number of event reports recording material destruction"

maverick %>% 
  ggplot() +
  geom_bar(aes(x = as.factor(damage), fill = as.factor(damage))) +
  scale_fill_manual(values = wes_palette("Zissou1", 2, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of event reports"
  ) +
  guides(fill = "none") +
  my_theme
```

## Source variables

### Sources (`source`)

*Description:* A reference to the source that produced the event report. Constructed as \[author\] (\[publication date\]) \[title\].

*Values:* A character string.

*Aggregations:* A character string consisting of all sources used to aggregate the event.

### Source author (`source_author`)

*Description:* The author of the source. Only available in the event report level dataset.

*Values:* A character string.

### Type of source (`source_type`)

*Description:* A categorical variable that indicates the type of event report source. Only available in the event report level dataset.

*Values:*

— Human rights commission

— International civil society organization

— International news agency

— International news media

— International organization

— National civil society organization

— National government agency

— National news media

— Special commission

— Other

### Number of sources (`number_of_sources`)

*Description:* The number of sources upon which the event coding is based. Only available in the event level dataset.

*Values:* Numerical.

### Aggregation set (`aggregation`)

*Description:* A categorical variable that indicates which aggregation rule set was used to produce the event level coding. Only available in the event level dataset.

*Values:* Most-representative, Most-informative, Most-conservative

# Limitations and recommendations

Like any dataset, MAVERICK is designed for a particular purpose and comes with both strengths and weaknesses. Below, we therefore outline a number of limitations and recommendations for prospective users.

A first limitation is that MAVERICK may not always provide an accurate count of the number of electoral violence deaths and injuries *in the aggregate*. MAVERICK is designed to be an event dataset, meaning that it only includes information on electoral violence whenever the reporting was sufficiently precise to place the incident in a specific location and during a limited time period (even if the exact location and timing were missing from the report). Consequently, we did not record general counts at the country or region level, such as "Some 20 people were killed in electoral violence in Côte d'Ivoire in 2022." This decision means that the aggregate counts in MAVERICK (e.g. by election, country, or country-year) are better interpreted as *total counts in documented events* than as *total counts*. Thus, whenever using the MAVERICK data to provide aggregate counts, users should remain cognizant that the counts may be lower than estimates reported at a more aggregate level.

A second limitation is that MAVERICK likely suffers from a range of reporting biases [see e.g. @dawkins2020]. The event report level data structure is designed to help users systematically gauge and address reporting biases across sources reporting on the same event [@weidmanngeelmuydenrod2015], that is, observed reporting biases. Likewise, the variables that record which inferential clues led to an event's inclusion in the dataset can help users assess systematic differences between events that fulfilled more or fewer inferential clues. For example, the maps in @fig-sources-civ and @fig-sources-kenya demonstrate that our broad sampling of different types of sources helps address some potential urban biases in news reporting [@kalyvas2004], as both human rights commissions and civil society organizations report on events in more and smaller localities than news media and agencies. Nevertheless, despite its strengths, the event report level structure cannot assess unobserved reporting biases, that is, whether some types of events systematically go unreported in every source, or whether some types of events are systematically reported in a way that excludes them from the dataset. Hence, while we have adopted a broad sampling approach and made it possible to investigate observable reporting biases, users should keep in mind that reporting biases may remain.

```{r}
#| echo: false
#| fig.align: center
#| fig-cap: "Number of event reports by location and source type in Côte d'Ivoire"
#| label: fig-sources-civ
#| fig-width: 7
#| fig-height: 7

maverick %>% 
  filter(country == "Ivory Coast" & geo_precision > 2) %>% 
  group_by(latitude, longitude, source_type) %>% 
  summarize(count = n(), .groups = "drop") %>% 
  ggplot() +
  geom_sf(data = shapefile_civ, linewidth = 0.15, fill = "grey97") +
  coord_sf(xlim = c(-9, -2), ylim = c(4, 11), expand = FALSE) +
  geom_point(aes(x = longitude, y = latitude, size = count, fill = source_type), shape = 21, alpha = 0.9, color = "black") +
  scale_fill_manual(
    values = wes_palette("Zissou1", 8, type = "continuous"),
    name = "Source type",
    guide = guide_legend(override.aes = list(size = 5))
  ) +
  scale_size(name = "Number of event reports", range = c(2, 15)) +
  theme_void() +
  theme(
    legend.title = element_text(size = 8), 
    legend.text = element_text(size = 8)
  )
```

```{r}
#| echo: false
#| fig.align: center
#| fig-cap: "Number of event reports by location and source type in Kenya"
#| label: fig-sources-kenya
#| fig-width: 7
#| fig-height: 7

maverick %>% 
  filter(country == "Kenya" & geo_precision > 2) %>% 
  group_by(latitude, longitude, source_type) %>% 
  summarize(count = n(), .groups = "drop") %>% 
  ggplot() +
  geom_sf(data = shapefile_ke, linewidth = 0.15, fill = "grey97") +
  coord_sf(xlim = c(33, 43), ylim = c(-5, 6), expand = FALSE) +
  geom_point(aes(x = longitude, y = latitude, size = count, fill = source_type), shape = 21, alpha = 0.9, color = "black") +
  scale_fill_manual(
    values = wes_palette("Zissou1", 9, type = "continuous"),
    name = "Source type",
    guide = guide_legend(override.aes = list(size = 5))
  ) +
  scale_size(name = "Number of event reports", range = c(2, 15)) +
  theme_void() +
  theme(
    legend.title = element_text(size = 8), 
    legend.text = element_text(size = 8)
  )
```

```{=latex}
\clearpage
```

# Working with the MAVERICK data

The MAVERICK data structure offers substantial detail, transparency, and customization, yet at the expense of user-friendliness. We therefore offer several examples of how to work with the data in `R`. In addition, we provide the `eventreport` `R` package, which provides convenience functions for loading the data, customizing the aggregation procedure, and calculating regression diagnostics. The `eventreport` package will be made available via CRAN, and can be installed using the `R` code below:

```{r, echo=TRUE, eval=FALSE}

# Install the package from CRAN
install.packages("eventreport")

# Load the package
library(eventreport)
```

## Working with the actor sets

Each event report and event observation has six sets of actor variables. As noted above, the order is not significant, and actors can be recorded in any of the six actor sets, even when earlier actor sets are empty. Thus, whenever calculating aggregate figures on actor involvement, the user needs to make use of all six relevant variables. For example, to gauge the number of events involving the police as an initiator, perpetrator, or intervener (but not as a passive bystander), the user needs to check all six `actor*_subtype` variables and all six `actor*_bystander` variables:

```{r}
police <- maverick_event_representative %>% 
  # Create a binary police involvement variable using all actor variables
  # while excluding events where the police were passive bystanders
  mutate(
    police = case_when(
      actor1_subtype == "Security forces: Police" & actor1_bystander != 1 ~ 1,
      actor2_subtype == "Security forces: Police" & actor2_bystander != 1 ~ 1,
      actor3_subtype == "Security forces: Police" & actor3_bystander != 1 ~ 1,
      actor4_subtype == "Security forces: Police" & actor4_bystander != 1 ~ 1,
      actor5_subtype == "Security forces: Police" & actor5_bystander != 1 ~ 1,
      actor6_subtype == "Security forces: Police" & actor6_bystander != 1 ~ 1,
      TRUE ~ 0
    ),
  ) %>% 
  # Calculate the share of events involving police
  summarize(police = round(mean(police) * 100, 0)) %>% 
  pull(police)

print(police)
```

Based on the above code, we can conclude that the police were involved as an active participant in `r police`% of all electoral violence events in the dataset.
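
The same check can be written as a loop over the actor sets, which avoids repeating one condition per set. The sketch below is an alternative formulation on invented data with only two actor sets (the real data has six). The data frame `events_toy` is hypothetical. Note one deliberate difference from the `case_when()` version: a missing `actor*_bystander` value is treated here as "not a bystander", whereas in `case_when()` a missing value makes the condition fail.

```{r, echo=TRUE, eval=FALSE}
# Toy event level data with two actor sets (values are invented)
events_toy <- data.frame(
  event_id = 1:3,
  actor1_subtype = c(
    "Security forces: Police", "Non-state armed actor: Militia",
    "Political actor: Youth wing"
  ),
  actor1_bystander = c(0, 0, 0),
  actor2_subtype = c("Citizens: Voters", "Security forces: Police", NA),
  actor2_bystander = c(0, 1, NA)
)

n_sets <- 2 # six in the actual data

# One logical column per actor set: police present and not a passive bystander
police_active <- do.call(cbind, lapply(seq_len(n_sets), function(i) {
  subtype <- events_toy[[paste0("actor", i, "_subtype")]]
  bystander <- events_toy[[paste0("actor", i, "_bystander")]]
  # %in% returns FALSE for missing values instead of NA
  subtype %in% "Security forces: Police" & !(bystander %in% 1)
}))

# An event counts if the condition holds in at least one actor set
events_toy$police <- as.integer(rowSums(police_active) > 0)

police_share <- round(mean(events_toy$police) * 100, 0)
print(police_share)
```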

Users can also restructure the dataset so that it contains one observation for each actor involvement, for instance, to plot the number of actor involvements per country and actor type:

```{r}
maverick_event_representative %>% 
  # Select only relevant variables
  select(
    event_id, country, actor1_type, actor2_type, actor3_type,
    actor4_type, actor5_type, actor6_type
  ) %>% 
  # Use pivot longer to restructure so that each actor involvement is
  # a separate row while keeping the event_id and country variables
  pivot_longer(
    cols = c(-event_id, -country),
    names_to = "actor",
    values_to = "actor_type"
  ) %>% 
  # IMPORTANT: Exclude empty actor involvements
  filter(!is.na(actor_type)) %>% 
  # Plot the number of actor records by country and actor type
  ggplot() +
  geom_bar(aes(x = actor_type, fill = actor_type)) +
  facet_wrap(~ country, nrow = 2) +
  scale_fill_manual(values = wes_palette("Zissou1", 8, type = "continuous")) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of actor records"
  ) +
  guides(fill = "none") +
  my_theme
```

## Working with multiple aggregation sets

A strength of the event report structure in MAVERICK is that users can assess whether their results are robust across different aggregation rule sets, that is, that the findings do not depend on how the information contained in the event reports is combined to produce a single event coding. The `eventreport` package is designed to make it easy to work with multiple aggregation sets at once. To load all three pre-aggregated event level datasets and combine them into a single long dataframe, simply call:

```{r, echo=TRUE, eval=FALSE}

# Load datasets using the eventreport package
# Takes about a minute to run
maverick_representative <- aggregate_maverick_rep()
maverick_informative <- aggregate_maverick_inf()
maverick_conservative <- aggregate_maverick_con()

# Combine into a long dataset using rbind
maverick_combined <- rbind(
  maverick_representative, maverick_informative, maverick_conservative
)
```
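
Once the three event level datasets are stacked, the `aggregation` variable can serve as a grouping variable for comparing results across rule sets. The sketch below uses a small invented stand-in (`maverick_toy`) for the combined dataset, so that it runs without the package. Only the `aggregation` and `deaths_best` variable names are taken from the codebook.

```{r, echo=TRUE, eval=FALSE}
library(dplyr)

# Toy stand-in for the combined long dataset: two events, three rule sets
maverick_toy <- tibble(
  event_id = rep(c(1, 2), times = 3),
  aggregation = rep(
    c("Most-representative", "Most-informative", "Most-conservative"),
    each = 2
  ),
  deaths_best = c(2, 0, 5, 3, 1, 0)
)

# Compare the total death count implied by each aggregation rule set
deaths_by_set <- maverick_toy %>%
  group_by(aggregation) %>%
  summarize(
    events = n_distinct(event_id),
    total_deaths = sum(deaths_best, na.rm = TRUE),
    .groups = "drop"
  )

print(deaths_by_set)
```

Large gaps between the rule sets on a quantity of interest suggest that conclusions drawn from a single aggregation may be sensitive to how conflicting reports are reconciled.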

## Compatibility with other actor-related datasets

There are many other datasets that collect data on political violence. Although some datasets ensure cross-dataset compatibility by including actor IDs from other datasets, this iteration of MAVERICK does not facilitate such compatibility for two reasons. First, because the final list of included actors in particular events depends on the chosen aggregation set, adding actor IDs from other datasets was not deemed feasible. Second, most of the actors included in MAVERICK are small or loosely organized groups that do not feature in other datasets of organized violence, such as the UCDP or ACLED.

Nevertheless, users can generate a list of all actors included in a particular subset of MAVERICK and then assess whether those actors also feature in the dataset they are interested in merging with. The code below compiles such a list from the event report level data (`maverick`), since the `actor*_precision` variables are only available at that level:

```{r, echo=TRUE, eval=TRUE}

# Create long datasets that use the actor involvement as unit of analysis
maverick_long1 <- maverick %>% 
  select(
    id, actor1, actor2, actor3, actor4, actor5, actor6
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "actor_name"
  )
  
maverick_long2 <- maverick %>% 
  select(
    id, actor1_id, actor2_id, actor3_id,
    actor4_id, actor5_id, actor6_id
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "actor_id"
  ) %>% 
  mutate(actor = str_replace(actor, "_id", ""))

maverick_long3 <- maverick %>% 
  select(
    id, actor1_type, actor2_type, actor3_type,
    actor4_type, actor5_type, actor6_type
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "actor_type"
  ) %>% 
  mutate(actor = str_replace(actor, "_type", ""))

maverick_long4 <- maverick %>% 
  select(
    id, actor1_subtype, actor2_subtype, actor3_subtype,
    actor4_subtype, actor5_subtype, actor6_subtype
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "actor_subtype"
  ) %>% 
  mutate(actor = str_replace(actor, "_subtype", ""))

maverick_long5 <- maverick %>% 
  select(
    id, actor1, actor2, actor3,
    actor4, actor5, actor6
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "dataset_label"
  ) %>% 
  mutate(actor = str_replace(actor, "_name", ""))

maverick_long6 <- maverick %>% 
  select(
    id, actor1_precision, actor2_precision, actor3_precision,
    actor4_precision, actor5_precision, actor6_precision
  ) %>% 
  pivot_longer(
    cols = c(-id),
    names_to = "actor",
    values_to = "precision"
  ) %>% 
  mutate(actor = str_replace(actor, "_precision", ""))

maverick_long <- maverick_long1 %>%
  inner_join(maverick_long2, by = c("id", "actor")) %>% 
  inner_join(maverick_long3, by = c("id", "actor")) %>% 
  inner_join(maverick_long4, by = c("id", "actor")) %>% 
  inner_join(maverick_long5, by = c("id", "actor")) %>% 
  inner_join(maverick_long6, by = c("id", "actor"))

maverick_long %>% 
  distinct(
    actor_name, actor_id, dataset_label,
    actor_type, actor_subtype, precision
  ) %>% 
  group_by(actor_id) %>% 
  summarize(
    actor_names = str_c(unique(actor_name), collapse = "; "),
    dataset_labels = str_c(unique(dataset_label), collapse = "; "),
    actor_type = str_c(unique(actor_type), collapse = "; "),
    actor_subtype = str_c(unique(actor_subtype), collapse = "; "),
    precision = str_c(unique(precision), collapse = "; "),
    .groups = 'drop'
  ) %>% 
  arrange(actor_id) %>% 
  select(
    actor_id, dataset_labels, actor_names,
    actor_type, actor_subtype, precision
  ) %>%
  head()
```
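
The repeated pivot-and-join pattern can be condensed into a single `pivot_longer()` call by using the special `".value"` sentinel, which splits each column name into an actor set and a variable name. The sketch below demonstrates this on invented data with two actor sets and two variables. It assumes that every column follows the `actor<number>_<variable>` naming pattern, so the unsuffixed actor name columns (`actor1`, `actor2`, and so on) would first need to be renamed, for example to `actor1_name`.

```{r, echo=TRUE, eval=FALSE}
library(dplyr)
library(tidyr)

# Toy data with two actor sets (values are invented for illustration)
toy <- tibble(
  id = c(1, 2),
  actor1_type = c("Security forces", "Political actor"),
  actor1_subtype = c("Security forces: Police", "Political actor: Youth wing"),
  actor2_type = c("Citizens", NA),
  actor2_subtype = c("Citizens: Voters", NA)
)

toy_long <- toy %>%
  pivot_longer(
    cols = -id,
    # First group becomes the actor set, second group names the output column
    names_to = c("actor", ".value"),
    names_pattern = "(actor\\d)_(.*)"
  ) %>%
  # Exclude empty actor sets
  filter(!is.na(type))

print(toy_long)
```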

# Appendix A

| Source | Country |
|------------------------------------|------------------------------------|
| Amnesty International (1996, May 26) *Côte d'Ivoire: Government opponents are the target of systematic repression.* | Côte d'Ivoire |
| Human Rights Watch (2001, August) *The new racism: The political manipulation of ethnicity in Côte d'Ivoire.* | Côte d'Ivoire |
| International Crisis Group (2010, May 5) *Côte d'Ivoire : sécuriser le processus électoral.* | Côte d'Ivoire |
| Mission d’Observation Electorale de l’Union Européenne (2010) *Mission d’Observation Electorale de l’Union Européenne.* | Côte d'Ivoire |
| Amnesty International (2011, May) *"They looked at his identity card and shot him dead": Six months of post-electoral violence in Côte d'Ivoire.* | Côte d'Ivoire |
| Amnesty International (2011, June) *"We want to go home, but we can't": Côte d'Ivoire's continuing crisis of displacement and insecurity.* | Côte d'Ivoire |
| The Carter Center (2011) *International Election Observation Mission to Côte d'Ivoire.* | Côte d'Ivoire |
| Human Rights Watch (2011, October) *"They killed them like it was nothing": The need for justice for Côte d'Ivoire's post-election crimes.* | Côte d'Ivoire |
| International Crisis Group (2011, March 3) *Côte d'Ivoire: Is war the only option?* | Côte d'Ivoire |
| International Crisis Group (2011, August 1) *A critical period for ensuring stability in Côte d'Ivoire.* | Côte d'Ivoire |
| International Crisis Group (2011, December 16) *Côte d'Ivoire: Continuing the recovery.* | Côte d'Ivoire |
| Opération des Nations Unies en Côte d'Ivoire (2011, May 10) *Rapport sur les violations des droits de l'homme et du droit international humanitaire commises à l'Ouest de la Côte d'Ivoire.* | Côte d'Ivoire |
| Conseil National des Droits de l'Homme (2021) *Rapport monitoring des violences commises du 16 Septembre au 10 Novembre 2020.* | Côte d'Ivoire |
| International Crisis Group (2020, September 29) *Côte d'Ivoire: An election delay for dialogue.* | Côte d'Ivoire |
| Ministère de la Justice et des Droits de l'Homme (2021) *Rapport d'activité de l'unité spéciale d'enquête sur les événements survenus à l'occasion de l'élection du Président de la République du 31 Octobre 2020.* | Côte d'Ivoire |
| Human Rights Watch (1993, November) *Divide and rule: State-sponsored ethnic violence in Kenya.* | Kenya |
| Amnesty International (1998, June 10) *Kenya: Political violence spirals.* | Kenya |
| Article 19 (1998, December) *Kenya: Post-election political violence.* | Kenya |
| Human Rights Watch (2002, May) *Playing with fire: Weapons proliferation, political violence, and human rights in Kenya.* | Kenya |
| The Carter Center (2003, May) *Observing the 2002 Kenya Elections: Final Report.* | Kenya |
| Human Rights Watch (2008, March) *Ballots to bullets: Organized political violence and Kenya's crisis of governance.* | Kenya |
| Human Rights Watch (2008, July) *"All the men have gone": War crimes in Kenya's Mt. Elgon conflict.* | Kenya |
| United Nations High Commissioner for Human Rights (2008) *Report from OHCHR fact-finding mission to Kenya, 6-28 February 2008.* | Kenya |
| The Commission of Inquiry on Post Election Violence (2008, October). | Kenya |
| Human Rights Watch (2013, February) *High stakes: Political violence and the 2013 elections in Kenya.* | Kenya |
| International Crisis Group (2013, January) *Kenya's 2013 elections.* | Kenya |
| International Crisis Group (2013, May) *Kenya after the elections.* | Kenya |
| Kenya National Commission on Human Rights (2013) *2013 elections: Safeguarding rights.* | Kenya |
| Kenya National Commission on Human Rights (2013) *Break from the past?* | Kenya |
| Human Rights Watch (2014, April) *"We were sent to kill you": Gang attacks in Western Kenya and the government's failed response.* | Kenya |
| Amnesty International and Human Rights Watch (2017, October) *"Kill those criminals": Security forces violations in Kenya's August 2017 elections.* | Kenya |
| Amnesty International (2017, August 9) *Kenyan police must guard against using excessive force during elections.* | Kenya |
| Amnesty International (2017, August 9) *Kenya: Police must only use force as a last resort in dealing with protests.* | Kenya |
| Amnesty International (2017, August 12) *Kenya: Investigate police killings of pro-opposition protesters.* | Kenya |
| Amnesty International (2017, October 12) *Kenya: Ban on demonstrations must not legitimize police crackdowns.* | Kenya |
| Amnesty International (2017, October 30) *Kenya: Violence, killings and intimidation amid election chaos.* | Kenya |
| Amnesty International (2017, November 17) *Kenya: Police must not use lethal force against opposition supporters.* | Kenya |
| Committee to Protect Journalists (2017, August 7) *Amid tensions ahead of Kenyan vote, journalists face violence and threats.* | Kenya |
| Committee to Protect Journalists (2017, August 17) *Kenyan journalists harassed, detained reporting on election violence.* | Kenya |
| European Union Election Observation Mission (2018, January) *Final report, General Elections 2017.* | Kenya |
| Human Rights Watch and Article 19 (2017, May) *"Not worth the risk": Threats to free expression ahead of Kenya's 2017 elections.* | Kenya |
| Human Rights Watch (2017, December) *"They were men in uniform": Sexual violence against women and girls in Kenya's 2017 elections.* | Kenya |
| Kenya National Commission on Human Rights (2018) *Silhouettes of brutality: An account of sexual violence during and after the 2017 general election.* | Kenya |
| Kenya National Commission on Human Rights (2018) *Still a mirage at dusk: A human rights account of the 2017 fresh presidential elections.* | Kenya |
| Kenya National Commission on Human Rights (2018, April) *The fallacious vote.* | Kenya |
| International Crisis Group (2022, June 9) *Kenya's 2022 election: High stakes.* | Kenya |
| International Crisis Group (2023, April 20) *Absorbing climate shocks and easing conflict in Kenya's Rift Valley.* | Kenya |
| Kenya National Commission on Human Rights (2022) *2022 general elections: From pre-poll to post-poll.* | Kenya |
| Kenya National Commission on Human Rights (2022) *Demystifying our democracy: Towards a human rights compliant elections.* | Kenya |
| Kenya National Commission on Human Rights (2022) *The bound ballot: A human rights account of the 2022 political party nomination exercise.* | Kenya |

: Pre-selected secondary sources

# References {.unnumbered}
